Action recognition by saliency-based dense sampling

Authors

  • Zengmin Xu
  • Ruimin Hu
  • Jun Chen
  • Chen Chen
  • Huafeng Chen
  • Hongyang Li
  • Qingquan Sun
Abstract

Action recognition, which aims to automatically classify actions from a series of observations, has attracted increasing attention in the computer vision community. State-of-the-art action recognition methods use densely sampled trajectories to build feature representations. However, their performance is limited by cluttered action regions and camera motion in real-world applications. However the background changes from one scenario to another, the salient cues of an action depend strongly on its appearance and motion. Based on this observation, we propose a novel saliency-based dense sampling strategy, improved dense trajectories (iDT) on salient region-based contrast boundary (iDT-RCB). Without any external human detector, a robust mask is generated to overcome the limitations of global-contrast-based saliency in action sequences. Warped optical flow is used to adjust the sampling of interest points and to remove subtle motions. We show that appropriately pruning feature points achieves a good balance between the saliency and the density of the sampled points. Experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. More specifically, fusing deep-learned features with our hand-crafted features further improves recognition performance over baseline dense sampling methods. In particular, the fusion scheme achieves state-of-the-art accuracy of 73.8% and 94.8% on Hollywood2 and UCF50, respectively.
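
The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' iDT-RCB implementation: the function name, the grid step, and both thresholds are hypothetical, and the saliency mask and warped flow field are assumed to be computed elsewhere. The sketch only shows how points on a dense grid might be kept when they are both salient and moving.

```python
import math


def sample_salient_points(saliency, flow, step=5, sal_thresh=0.5, flow_thresh=0.4):
    """Keep dense-grid points that are both salient and in motion.

    saliency: H x W nested list of values in [0, 1] (assumed precomputed mask).
    flow: H x W nested list of (dx, dy) pairs (assumed warped optical flow).
    step, sal_thresh, flow_thresh: illustrative values, not taken from the paper.
    """
    height, width = len(saliency), len(saliency[0])
    points = []
    # Walk a regular grid, as in dense sampling.
    for r in range(step // 2, height, step):
        for c in range(step // 2, width, step):
            dx, dy = flow[r][c]
            is_salient = saliency[r][c] >= sal_thresh
            is_moving = math.hypot(dx, dy) >= flow_thresh
            # Prune points in the background or with only subtle motion.
            if is_salient and is_moving:
                points.append((r, c))
    return points
```

Raising `sal_thresh` or `flow_thresh` in this sketch keeps fewer but more salient points, while lowering them keeps a denser set, which is the saliency-versus-density trade-off the abstract refers to.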

Similar resources

Space-Variant Descriptor Sampling for Action Recognition Based on Saliency and Eye Movements

Algorithms using “bag of features”-style video representations currently achieve state-of-the-art performance on action recognition tasks, such as the challenging Hollywood2 benchmark [1,2,3]. These algorithms are based on local spatiotemporal descriptors that can be extracted either sparsely (at interest points) or densely (on regular grids), with dense sampling typically leading to the best p...

Dense v.s. Sparse: A Comparative Study of Sampling Analysis in Scene Classification of High-Resolution Remote Sensing Imagery

Scene classification is a key problem in the interpretation of high-resolution remote sensing imagery. Many state-of-the-art methods, e.g. bag-of-visual-words model and its variants, the topic models as well as deep learning-based approaches, share similar procedures: patch sampling, feature description/learning and classification. Patch sampling is the first and a key procedure which has a gre...

Compressed-Sampling-Based Image Saliency Detection in the Wavelet Domain

When watching natural scenes, an overwhelming amount of information is delivered to the Human Visual System (HVS). The optic nerve is estimated to receive around 10^8 bits of information a second. This large amount of information can’t be processed right away through our neural system. Visual attention mechanism enables HVS to spend neural resources efficiently, only on the selected parts of the...

Determining Patch Saliency Using Low-Level Context

The increased use of context for high level reasoning has been popular in recent works to increase recognition accuracy. In this paper, we consider an orthogonal application of context. We explore the use of context to determine which low-level appearance cues in an image are salient or representative of an image’s contents. Existing classes of low-level saliency measures for image patches incl...

Bottom-up Attention Improves Action Recognition Using Histograms of Oriented Gradients

When recognizing others’ action, we pay attention to their body parts and/or objects they are manipulating rather than observing their whole body movement. Bottom-up saliency is a promising cue to determine where to attend and hence to identify what the persons are doing because their body parts acting on objects become more conspicuous when contributing to the action. This paper proposes an ar...


Journal:
  • Neurocomputing

Volume 236

Publication date: 2017